Here, we’re just setting a few options.

knitr::opts_chunk$set(
  warning = TRUE, # show warnings during codebook generation
  message = TRUE, # show messages during codebook generation
  error = TRUE, # do not interrupt codebook generation in case of errors,
                # usually better for debugging
  echo = TRUE  # show R code
)
ggplot2::theme_set(ggplot2::theme_bw())

R Session Info

sessionInfo()
## R version 4.0.1 (2020-06-06)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18362)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252   
## [3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C                   
## [5] LC_TIME=English_Canada.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] knitr_1.29       magrittr_1.5     tidyselect_1.1.0 munsell_0.5.0   
##  [5] colorspace_1.4-1 R6_2.4.1         rlang_0.4.6      stringr_1.4.0   
##  [9] dplyr_1.0.0      tools_4.0.1      grid_4.0.1       gtable_0.3.0    
## [13] xfun_0.15        htmltools_0.5.0  ellipsis_0.3.1   yaml_2.2.1      
## [17] digest_0.6.25    tibble_3.0.1     lifecycle_0.2.0  crayon_1.3.4    
## [21] purrr_0.3.4      ggplot2_3.3.2    vctrs_0.3.1      glue_1.4.1      
## [25] evaluate_0.14    rmarkdown_2.3    stringi_1.4.6    compiler_4.0.1  
## [29] pillar_1.4.4     generics_0.0.2   scales_1.1.1     pkgconfig_2.0.3
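
The session info above is only visible in the rendered document. A minimal sketch, using base R only, of how the same information could also be stored as a text file next to the codebook; the file location (`info_file`) is a placeholder, not something the codebook package prescribes.

```r
# Capture the printed session info as a character vector, one element per line.
info_lines <- capture.output(print(sessionInfo()))

# Placeholder location: a temporary file keeps the sketch free of side effects.
info_file <- tempfile(fileext = ".txt")
writeLines(info_lines, info_file)

# Read it back to confirm that the file holds the same lines.
identical(readLines(info_file), info_lines)
```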

Now, we’re preparing our data for the codebook.

library(codebook)
codebook_data <- mtcars

# omit the following lines, if your missing values are already properly labelled
codebook_data <- detect_missing(codebook_data,
    only_labelled = TRUE, # only labelled values are autodetected as
                          # missing
    negative_values_are_missing = FALSE, # if TRUE, negative values would be
                                         # treated as missing values
    ninety_nine_problems = TRUE    # 99/999 are missing values, if they
                                   # are more than 5 MAD from the median
    )
# add variable descriptions
var_label(codebook_data) <- list(
  mpg = "Miles/(US) gallon.",
  cyl = "Number of cylinders.",
  disp = "Displacement (cu.in.).",
  hp = "Gross horsepower.",
  drat = "Rear axle ratio.",
  wt = "Weight (1000 lbs).",
  qsec = "1/4 mile time.",
  vs = "Engine shape.",
  am = "Transmission type.",
  gear = "Number of forward gears.",
  carb = "Number of carburetors."
)
val_labels(codebook_data$vs) <- c("V-shaped" = 0, "straight" = 1)
val_labels(codebook_data$am) <- c("automatic" = 0, "manual" = 1)
# Name of the dataset
metadata(codebook_data)$name <- "`mtcars` example dataset from the datasets package"

# description of the dataset
metadata(codebook_data)$description <- "The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models)."

# when was the data collected: ideally in ISO 8601 format
metadata(codebook_data)$temporalCoverage <- "1973/1974"

metadata(codebook_data)$citation <- "Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411."
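
The comment on `ninety_nine_problems` describes a rule: 99 and 999 count as missing only if they lie more than 5 MAD from the median. The following is a rough base R sketch of that rule as stated in the comment. `flag_ninety_nine()` is a hypothetical helper written for illustration, and the package's own implementation may differ in its details.

```r
# Hypothetical helper, not part of the codebook package.
# Sets 99/999 to NA when they are more than `threshold` MADs from the median.
flag_ninety_nine <- function(x, codes = c(99, 999), threshold = 5) {
  center <- median(x, na.rm = TRUE)
  spread <- mad(x, na.rm = TRUE)  # scaled median absolute deviation
  is_code <- x %in% codes
  too_far <- abs(x - center) > threshold * spread
  x[is_code & too_far] <- NA
  x
}

# Ages: 99 is far away from the other values, so it is set to NA.
age <- c(21, 25, 30, 28, 35, 40, 99, 33)
flag_ninety_nine(age)

# Percentages: 99 is a plausible value here and is left untouched.
pct <- c(90, 95, 99, 97, 92, 98, 96, 94)
flag_ninety_nine(pct)
```

The point of the distance check is that a blanket recoding of 99 would destroy legitimate values in variables such as `pct`.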

You can find other useful pieces of metadata, which might help others make use of your data in the future, at https://schema.org/Dataset.
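
As a sketch of what such additional metadata might look like, the list below collects a few properties that schema.org defines for datasets (`temporalCoverage`, `spatialCoverage`, `keywords`, `creator`). All values are placeholders, and it is an assumption that each element could be assigned in the same way as above, e.g. `metadata(codebook_data)$spatialCoverage <- ...`.

```r
# Build an ISO 8601 interval ("start/end") from two dates.
# The dates are placeholders, not taken from the mtcars documentation.
start <- as.Date("1973-01-01")
end   <- as.Date("1974-12-31")
temporal_coverage <- paste(format(start, "%Y-%m-%d"),
                           format(end, "%Y-%m-%d"),
                           sep = "/")

# Further schema.org/Dataset properties in a plain list (placeholder values).
extra_metadata <- list(
  temporalCoverage = temporal_coverage,
  spatialCoverage  = "United States",
  keywords         = c("cars", "fuel consumption"),
  creator          = list("@type" = "Person",
                          givenName = "Jane", familyName = "Doe")
)
str(extra_metadata)
```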

Create codebook

codebook(codebook_data)
## No missing values.

Metadata

Description

Dataset name: mtcars example dataset from the datasets package

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).

Metadata for search engines

  • Temporal Coverage: 1973/1974

  • Citation: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.

  • Date published: 2020-07-26

  • Keywords: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb

Variables

mpg

Miles/(US) gallon.

Distribution

Distribution of values for mpg

0 missing values.

Summary statistics

name  label               data_type  n_missing  complete_rate  min  median  max  mean      sd        hist
mpg   Miles/(US) gallon.  numeric    0          1              10   19      34   20.09062  6.026948  ▃▇▅▁▂
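
Most columns of the summary table above can be reproduced with base R, which can be useful for checking a codebook by hand. This is a sketch of the calculation for `mpg`, not the code the package runs internally. Note that the rendered table appears to round min, median and max, so the values here carry more digits.

```r
x <- mtcars$mpg

summary_row <- data.frame(
  name          = "mpg",
  n_missing     = sum(is.na(x)),       # number of NA values
  complete_rate = mean(!is.na(x)),     # share of non-missing values
  min           = min(x, na.rm = TRUE),
  median        = median(x, na.rm = TRUE),
  max           = max(x, na.rm = TRUE),
  mean          = mean(x, na.rm = TRUE),
  sd            = sd(x, na.rm = TRUE)
)
summary_row
```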

cyl

Number of cylinders.

Distribution

Distribution of values for cyl

0 missing values.

Summary statistics

name  label                 data_type  n_missing  complete_rate  min  median  max  mean    sd        hist
cyl   Number of cylinders.  numeric    0          1              4    6       8    6.1875  1.785922  ▆▁▃▁▇

disp

Displacement (cu.in.).

Distribution

Distribution of values for disp

0 missing values.

Summary statistics

name  label                   data_type  n_missing  complete_rate  min  median  max  mean      sd        hist
disp  Displacement (cu.in.).  numeric    0          1              71   196     472  230.7219  123.9387  ▇▃▃▃▂

hp

Gross horsepower.

Distribution

Distribution of values for hp

0 missing values.

Summary statistics

name  label              data_type  n_missing  complete_rate  min  median  max  mean      sd        hist
hp    Gross horsepower.  numeric    0          1              52   123     335  146.6875  68.56287  ▇▇▆▃▁

drat

Rear axle ratio.

Distribution

Distribution of values for drat

0 missing values.

Summary statistics

name  label             data_type  n_missing  complete_rate  min  median  max  mean      sd         hist
drat  Rear axle ratio.  numeric    0          1              2.8  3.7     4.9  3.596563  0.5346787  ▇▃▇▅▁

wt

Weight (1000 lbs).

Distribution

Distribution of values for wt

0 missing values.

Summary statistics

name  label               data_type  n_missing  complete_rate  min  median  max  mean     sd         hist
wt    Weight (1000 lbs).  numeric    0          1              1.5  3.3     5.4  3.21725  0.9784574  ▃▃▇▁▂

qsec

1/4 mile time.

Distribution

Distribution of values for qsec

0 missing values.

Summary statistics

name  label           data_type  n_missing  complete_rate  min  median  max  mean      sd        hist
qsec  1/4 mile time.  numeric    0          1              14   18      23   17.84875  1.786943  ▃▇▇▂▁

vs

Engine shape.

Distribution

Distribution of values for vs

0 missing values.

Summary statistics

name  label          data_type       n_missing  complete_rate  min  median  max  mean    sd         n_value_labels  hist
vs    Engine shape.  haven_labelled  0          1              0    0       1    0.4375  0.5040161  2               ▇▁▁▁▁▁▁▆

Value labels

Response choices
name value
V-shaped 0
straight 1
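
The value labels map the stored codes 0 and 1 to readable categories. A base R sketch of the same mapping, using a plain factor instead of the labelled class that appears in the table above (`haven_labelled`), can make the category counts easy to inspect:

```r
# Named vector in the same "label = code" form used for val_labels() above.
vs_labels <- c("V-shaped" = 0, "straight" = 1)

# Convert the numeric codes into a factor with the label names as levels.
vs_factor <- factor(mtcars$vs, levels = vs_labels, labels = names(vs_labels))

# Count the cars per engine shape.
table(vs_factor)
```

Unlike a labelled vector, a factor drops the underlying numeric codes, so this form suits inspection and plotting rather than storage.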

am

Transmission type.

Distribution

Distribution of values for am

0 missing values.

Summary statistics

name  label               data_type       n_missing  complete_rate  min  median  max  mean     sd         n_value_labels  hist
am    Transmission type.  haven_labelled  0          1              0    0       1    0.40625  0.4989909  2               ▇▁▁▁▁▁▁▆

Value labels

Response choices
name value
automatic 0
manual 1

gear

Number of forward gears.

Distribution

Distribution of values for gear

0 missing values.

Summary statistics

name  label                     data_type  n_missing  complete_rate  min  median  max  mean    sd         hist
gear  Number of forward gears.  numeric    0          1              3    4       5    3.6875  0.7378041  ▇▁▆▁▂

carb

Number of carburetors.

Distribution

Distribution of values for carb

0 missing values.

Summary statistics

name  label                   data_type  n_missing  complete_rate  min  median  max  mean    sd      hist
carb  Number of carburetors.  numeric    0          1              1    2       8    2.8125  1.6152  ▇▂▅▁▁

Missingness report

Codebook table

JSON-LD metadata

The following JSON-LD can be found by search engines, if you share this codebook publicly on the web.

{
  "name": "`mtcars` example dataset from the datasets package",
  "description": "The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).\n\n\n## Table of variables\nThis table contains variable names, labels, and number of missing values.\nSee the complete codebook for more.\n\n|name |label                    | n_missing|\n|:----|:------------------------|---------:|\n|mpg  |Miles/(US) gallon.       |         0|\n|cyl  |Number of cylinders.     |         0|\n|disp |Displacement (cu.in.).   |         0|\n|hp   |Gross horsepower.        |         0|\n|drat |Rear axle ratio.         |         0|\n|wt   |Weight (1000 lbs).       |         0|\n|qsec |1/4 mile time.           |         0|\n|vs   |Engine shape.            |         0|\n|am   |Transmission type.       |         0|\n|gear |Number of forward gears. |         0|\n|carb |Number of carburetors.   |         0|\n\n### Note\nThis dataset was automatically described using the [codebook R package](https://rubenarslan.github.io/codebook/) (version 0.9.2).",
  "temporalCoverage": "1973/1974",
  "citation": "Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.",
  "datePublished": "2020-07-26",
  "keywords": ["mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"],
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "variableMeasured": [
    {
      "name": "mpg",
      "description": "Miles/(US) gallon.",
      "@type": "propertyValue"
    },
    {
      "name": "cyl",
      "description": "Number of cylinders.",
      "@type": "propertyValue"
    },
    {
      "name": "disp",
      "description": "Displacement (cu.in.).",
      "@type": "propertyValue"
    },
    {
      "name": "hp",
      "description": "Gross horsepower.",
      "@type": "propertyValue"
    },
    {
      "name": "drat",
      "description": "Rear axle ratio.",
      "@type": "propertyValue"
    },
    {
      "name": "wt",
      "description": "Weight (1000 lbs).",
      "@type": "propertyValue"
    },
    {
      "name": "qsec",
      "description": "1/4 mile time.",
      "@type": "propertyValue"
    },
    {
      "name": "vs",
      "description": "Engine shape.",
      "value": "0. V-shaped,\n1. straight",
      "maxValue": 1,
      "minValue": 0,
      "@type": "propertyValue"
    },
    {
      "name": "am",
      "description": "Transmission type.",
      "value": "0. automatic,\n1. manual",
      "maxValue": 1,
      "minValue": 0,
      "@type": "propertyValue"
    },
    {
      "name": "gear",
      "description": "Number of forward gears.",
      "@type": "propertyValue"
    },
    {
      "name": "carb",
      "description": "Number of carburetors.",
      "@type": "propertyValue"
    }
  ]
}
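
To illustrate how the `variableMeasured` entries relate to the variable labels, here is a minimal base R sketch that assembles a similar structure for two variables by string formatting. It is not how the codebook package generates its JSON-LD, and it does no escaping of quotes or backslashes, so a JSON library would be the safer choice for real data.

```r
# Two of the variable labels from above, as a named character vector.
labels <- c(mpg = "Miles/(US) gallon.", cyl = "Number of cylinders.")

# One JSON object per variable (labels must not contain quotes or backslashes).
entries <- sprintf(
  '    {\n      "name": "%s",\n      "description": "%s",\n      "@type": "propertyValue"\n    }',
  names(labels), labels
)

# Wrap the entries in a minimal Dataset document.
json_ld <- paste0(
  '{\n  "@context": "http://schema.org/",\n  "@type": "Dataset",\n  "variableMeasured": [\n',
  paste(entries, collapse = ",\n"),
  "\n  ]\n}"
)
cat(json_ld, "\n")
```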